Machine Learning Techniques 





Lecture 14: Radial Basis Function Network 
Hsuan-Tien Lin
htlin@csie.ntu.edu.tw 
Department of Computer Science 
& Information Engineering 


National Taiwan University 


Radial Basis Function Network 


Roadmap 


1. Embedding Numerous Features: Kernel Models
2. Combining Predictive Features: Aggregation Models
3. Distilling Implicit Features: Extraction Models


Lecture 13: Deep Learning 


pre-training with denoising autoencoder 
(non-linear PCA) and fine-tuning with backprop 
for NNet with many layers 















Lecture 14: Radial Basis Function Network 
RBF Network Hypothesis 

RBF Network Learning 

k-Means Algorithm 

k-Means and RBF Network in Action


Radial Basis Function Network RBF Network Hypothesis


Gaussian SVM Revisited 
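$g_{\text{SVM}}(\mathbf{x}) = \text{sign}\left(\sum_{\text{SV}} \alpha_n y_n \exp\left(-\gamma \|\mathbf{x} - \mathbf{x}_n\|^2\right) + b\right)$

Gaussian SVM: find $\alpha_n$ to combine Gaussians centered at $\mathbf{x}_n$;
achieve large margin in infinite-dimensional space, remember? :-)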





• Gaussian kernel: also called Radial Basis Function (RBF) kernel
• radial: only depends on distance between $\mathbf{x}$ and 'center' $\mathbf{x}_n$
• basis function: to be 'combined'
• let $g_n(\mathbf{x}) = y_n \exp\left(-\gamma \|\mathbf{x} - \mathbf{x}_n\|^2\right)$:
  $g_{\text{SVM}}(\mathbf{x}) = \text{sign}\left(\sum_{\text{SV}} \alpha_n g_n(\mathbf{x}) + b\right)$
  - linear aggregation of selected radial hypotheses

Radial Basis Function (RBF) Network:
linear aggregation of radial hypotheses


From Neural Network to RBF Network 


[Figure: network diagrams of a Neural Network (left) and an RBF Network (right); the hidden units of the RBF Network are centers]

• hidden layer different:
  (inner-product + tanh) versus (distance + Gaussian)
• output layer same: just linear aggregation

RBF Network: historically a type of NNet


RBF Network Hypothesis 


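RBF Network

$h(\mathbf{x}) = \text{Output}\left(\sum_{m=1}^{M} \beta_m \text{RBF}(\mathbf{x}, \boldsymbol{\mu}_m) + b\right)$

key variables:
centers $\boldsymbol{\mu}_m$; (signed) votes $\beta_m$

$g_{\text{SVM}}$ for Gaussian-SVM
• RBF: Gaussian; Output: sign (binary classification)
• $M = \#\text{SV}$; $\boldsymbol{\mu}_m$: SVM SVs $\mathbf{x}_m$; $\beta_m$: $\alpha_m y_m$ from SVM Dual

learning: given RBF and Output,
decide $\boldsymbol{\mu}_m$ and $\beta_m$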
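The hypothesis above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a Gaussian RBF and a sign output; the function names `gaussian_rbf` and `rbf_network`, the toy centers, and the votes are hypothetical and not taken from the lecture.

```python
import numpy as np

def gaussian_rbf(x, mu, gamma=1.0):
    """Gaussian RBF: exp(-gamma * ||x - mu||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

def rbf_network(x, centers, betas, b=0.0, gamma=1.0, output=np.sign):
    """h(x) = Output(sum_m beta_m * RBF(x, mu_m) + b)."""
    score = sum(beta * gaussian_rbf(x, mu, gamma)
                for mu, beta in zip(centers, betas)) + b
    return output(score)

# Hypothetical toy network: one positive center and one negative center.
centers = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
betas = [1.0, -1.0]

near_positive = rbf_network([0.9, 1.2], centers, betas)
near_negative = rbf_network([-1.1, -0.8], centers, betas)
print(near_positive, near_negative)
```

Each query point takes the sign of the vote from the center it is most similar to, because the Gaussian term of the far center is negligible.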


RBF and Similarity 
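general similarity function between $\mathbf{x}$ and $\mathbf{x}'$:
  $\text{Neuron}(\mathbf{x}, \mathbf{x}') = \tanh(\gamma \mathbf{x}^T \mathbf{x}' + 1)$
  $\text{DNASim}(\mathbf{x}, \mathbf{x}') = \text{EditDistance}(\mathbf{x}, \mathbf{x}')$

kernel: similarity via $\mathcal{Z}$-space inner product
  - governed by Mercer's condition, remember? :-)
  $\text{Poly}(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^2$
  $\text{Gaussian}(\mathbf{x}, \mathbf{x}') = \exp\left(-\gamma \|\mathbf{x} - \mathbf{x}'\|^2\right)$ (both a kernel and an RBF)

RBF: similarity via $\mathcal{X}$-space distance
  - often monotonically non-increasing to distance
  $\text{Truncated}(\mathbf{x}, \mathbf{x}') = [\![\|\mathbf{x} - \mathbf{x}'\| \le 1]\!] \, (1 - \|\mathbf{x} - \mathbf{x}'\|)^2$

RBF Network: distance similarity-to-centers as
feature transform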
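As a small check of the "monotonically non-increasing to distance" remark, the sketch below evaluates the Gaussian and Truncated similarities at growing distances from a fixed point. The function names, the sample distances, and $\gamma = 1$ are assumptions made for illustration only.

```python
import numpy as np

def gaussian(x, xp, gamma=1.0):
    """Gaussian similarity: exp(-gamma * ||x - x'||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)
    return float(np.exp(-gamma * np.sum(diff ** 2)))

def truncated(x, xp):
    """Truncated similarity: (1 - ||x - x'||)^2 within distance 1, else 0."""
    d = float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)))
    return (1.0 - d) ** 2 if d <= 1.0 else 0.0

origin = np.zeros(2)
distances = [0.0, 0.25, 0.5, 1.0, 2.0]          # hypothetical sample distances
points = [np.array([d, 0.0]) for d in distances]

gauss_vals = [gaussian(p, origin) for p in points]
trunc_vals = [truncated(p, origin) for p in points]

for d, g, t in zip(distances, gauss_vals, trunc_vals):
    print(f"distance={d:4.2f}  gaussian={g:.4f}  truncated={t:.4f}")
```

Both similarities equal 1 at distance 0 and never increase as the distance grows; the Truncated one reaches exactly 0 at distance 1 and stays there.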


Fun Time

Which of the following is not a radial basis function?
1. $\phi(\mathbf{x}, \boldsymbol{\mu}) = \exp\left(-\gamma \|\mathbf{x} - \boldsymbol{\mu}\|^2\right)$
2. $\phi(\mathbf{x}, \boldsymbol{\mu}) = -\sqrt{\mathbf{x}^T \mathbf{x} - 2 \mathbf{x}^T \boldsymbol{\mu} + \boldsymbol{\mu}^T \boldsymbol{\mu}}$
3. $\phi(\mathbf{x}, \boldsymbol{\mu}) = [\![\mathbf{x} = \boldsymbol{\mu}]\!]$
4. $\phi(\mathbf{x}, \boldsymbol{\mu}) = \mathbf{x}^T \mathbf{x} - \boldsymbol{\mu}^T \boldsymbol{\mu}$

Reference Answer: (4)

Note that (3) is an extreme case of (1) (Gaussian) with $\gamma \to \infty$, and (2) contains an $\|\mathbf{x} - \boldsymbol{\mu}\|^2$ somewhere :-).


Radial Basis Function Network RBF Network Learning


Full RBF Network

$h(\mathbf{x}) = \text{Output}\left(\sum_{m=1}^{M} \beta_m \text{RBF}(\mathbf{x}, \boldsymbol{\mu}_m)\right)$

• full RBF Network: $M = N$ and each $\boldsymbol{\mu}_m = \mathbf{x}_m$
• physical meaning: each $\mathbf{x}_m$ influences similar $\mathbf{x}$ by $\beta_m$
• e.g. uniform influence with $\beta_m = 1 \cdot y_m$ for binary classification:
  $g_{\text{uniform}}(\mathbf{x}) = \text{sign}\left(\sum_{m=1}^{N} y_m \exp\left(-\gamma \|\mathbf{x} - \mathbf{x}_m\|^2\right)\right)$
  - aggregate each example's opinion subject to similarity

full RBF Network: lazy way to decide $\boldsymbol{\mu}_m$


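Nearest Neighbor

$g_{\text{uniform}}(\mathbf{x}) = \text{sign}\left(\sum_{m=1}^{N} y_m \exp\left(-\gamma \|\mathbf{x} - \mathbf{x}_m\|^2\right)\right)$

• $\exp\left(-\gamma \|\mathbf{x} - \mathbf{x}_m\|^2\right)$: maximum when $\mathbf{x}$ closest to $\mathbf{x}_m$
  - maximum one often dominates the $\sum_{m=1}^{N}$ term
• take $y_m$ of maximum $\exp(\ldots)$ instead of voting of all $y_m$
  - selection instead of aggregation
• physical meaning:
  $g_{\text{nbor}}(\mathbf{x}) = y_m$ such that $\mathbf{x}$ closest to $\mathbf{x}_m$
  - called nearest neighbor model
• can uniformly aggregate $k$ neighbors also: $k$ nearest neighbor

$k$ nearest neighbor:
also lazy but very intuitive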
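The three lazy models on this slide (uniform Gaussian voting, nearest neighbor, and $k$ nearest neighbor) can be compared on a toy data set. This is a sketch under stated assumptions: the data, the query point, and the function names `g_uniform`, `g_nbor`, and `g_knn` are hypothetical, and ties in the vote are not handled.

```python
import numpy as np

def g_uniform(x, X, y, gamma=1.0):
    """Every example votes y_m, weighted by its Gaussian similarity to x."""
    sq_dists = np.sum((X - x) ** 2, axis=1)
    return np.sign(np.sum(y * np.exp(-gamma * sq_dists)))

def g_nbor(x, X, y):
    """Selection instead of aggregation: label of the closest example."""
    sq_dists = np.sum((X - x) ** 2, axis=1)
    return y[np.argmin(sq_dists)]

def g_knn(x, X, y, k=3):
    """Uniform vote among the k closest examples."""
    sq_dists = np.sum((X - x) ** 2, axis=1)
    nearest = np.argsort(sq_dists)[:k]
    return np.sign(np.sum(y[nearest]))

# Hypothetical toy data: two negative examples near the origin, three positive far away.
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [2.8, 3.2]])
y = np.array([-1.0, -1.0, 1.0, 1.0, 1.0])
x_query = np.array([0.1, 0.0])

print(g_uniform(x_query, X, y), g_nbor(x_query, X, y), g_knn(x_query, X, y, k=3))
print(g_knn(x_query, X, y, k=5))  # k = N ignores distance and follows the majority
```

With $k = 5$ every example votes equally, so the distant positive majority wins, which illustrates why the choice of $k$ matters.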


Interpolation by Full RBF Network 


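full RBF Network for squared error regression:

$h(\mathbf{x}) = \sum_{m=1}^{N} \beta_m \text{RBF}(\mathbf{x}, \mathbf{x}_m)$ (Output is dropped, i.e. the identity, for regression)

• just linear regression on RBF-transformed data
  $\mathbf{z}_n = [\text{RBF}(\mathbf{x}_n, \mathbf{x}_1), \text{RBF}(\mathbf{x}_n, \mathbf{x}_2), \ldots, \text{RBF}(\mathbf{x}_n, \mathbf{x}_N)]$
• optimal $\boldsymbol{\beta}$? $\boldsymbol{\beta} = (Z^T Z)^{-1} Z^T \mathbf{y}$, if $Z^T Z$ invertible, remember? :-)
• size of $Z$? $N$ (examples) by $N$ (centers)
  - symmetric square matrix
• theoretical fact: if $\mathbf{x}_n$ all different, $Z$ with Gaussian RBF invertible

optimal $\boldsymbol{\beta}$ with invertible $Z$: $\boldsymbol{\beta} = Z^{-1} \mathbf{y}$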
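To see the interpolation property numerically, the sketch below builds $Z$ for a few distinct random points and solves $Z \boldsymbol{\beta} = \mathbf{y}$. The data, the target function, the helper name `gaussian_matrix`, and the value $\gamma = 5$ are assumptions chosen only to keep $Z$ well conditioned in this toy setting.

```python
import numpy as np

def gaussian_matrix(A, B, gamma):
    """Z[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(8, 2))   # hypothetical inputs, all different
y = np.sin(3.0 * X[:, 0]) + X[:, 1]       # hypothetical regression targets

Z = gaussian_matrix(X, X, gamma=5.0)      # N by N, symmetric
beta = np.linalg.solve(Z, y)              # beta = Z^{-1} y without forming the inverse
predictions = Z @ beta
E_in = np.mean((predictions - y) ** 2)
print("E_in =", E_in)
```

`np.linalg.solve` is used instead of an explicit inverse, which is the usual numerically safer way to apply $Z^{-1}$.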


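Regularized Full RBF Network

full Gaussian RBF Network for regression: $\boldsymbol{\beta} = Z^{-1} \mathbf{y}$

$g_{\text{RBF}}(\mathbf{x}_1) = \boldsymbol{\beta}^T \mathbf{z}_1 = \mathbf{y}^T Z^{-1} (\text{first column of } Z) = \mathbf{y}^T [1 \; 0 \; \ldots \; 0]^T = y_1$
  - $g_{\text{RBF}}(\mathbf{x}_n) = y_n$, i.e. $E_{\text{in}}(g_{\text{RBF}}) = 0$, yeah!! :-)

• called exact interpolation for function approximation
• but overfitting for learning? :-(
• how about regularization? e.g. ridge regression for $\boldsymbol{\beta}$ instead
  - optimal $\boldsymbol{\beta} = (Z^T Z + \lambda I)^{-1} Z^T \mathbf{y}$
• seen $Z$? $Z = [\text{Gaussian}(\mathbf{x}_n, \mathbf{x}_m)]$ = Gaussian kernel matrix $K$

effect of regularization in different spaces:
kernel ridge regression: $\boldsymbol{\beta} = (K + \lambda I)^{-1} \mathbf{y}$;
regularized full RBFNet: $\boldsymbol{\beta} = (Z^T Z + \lambda I)^{-1} Z^T \mathbf{y}$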
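The two regularized solutions can be computed side by side on toy data. This is a sketch, not a benchmark: the data, noise level, $\gamma = 5$, and $\lambda = 0.1$ are hypothetical values, and `gaussian_matrix` and `e_in` are helper names introduced here.

```python
import numpy as np

def gaussian_matrix(A, B, gamma):
    """Z[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(10, 2))          # hypothetical inputs
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=10)      # hypothetical noisy targets
gamma, lam = 5.0, 0.1

Z = gaussian_matrix(X, X, gamma)                  # also the Gaussian kernel matrix K
I = np.eye(len(X))

beta_exact = np.linalg.solve(Z, y)                          # exact interpolation
beta_rbfnet = np.linalg.solve(Z.T @ Z + lam * I, Z.T @ y)   # regularized full RBFNet
beta_krr = np.linalg.solve(Z + lam * I, y)                  # kernel ridge regression

def e_in(beta):
    """Mean squared training error of h(x_n) = z_n^T beta."""
    return np.mean((Z @ beta - y) ** 2)

for name, beta in [("exact", beta_exact), ("RBFNet", beta_rbfnet), ("KRR", beta_krr)]:
    print(f"{name:7s} E_in={e_in(beta):.3e}  ||beta||={np.linalg.norm(beta):.3f}")
```

Both regularized solutions give up zero training error in exchange for a shorter $\boldsymbol{\beta}$, and they generally differ from each other because the penalty is applied in different spaces.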


Fewer Centers as Regularization 
recall:

$g_{\text{SVM}}(\mathbf{x}) = \text{sign}\left(\sum_{\text{SV}} \alpha_n y_n \exp\left(-\gamma \|\mathbf{x} - \mathbf{x}_n\|^2\right) + b\right)$
  - only '$\ll N$' SVs needed in 'network'

• next: $M \ll N$ instead of $M = N$
• effect: regularization
  by constraining number of centers and voting weights
• physical meaning of centers $\boldsymbol{\mu}_m$: prototypes

remaining question:
how to extract prototypes?


Fun Time

If $\mathbf{x}_1 = \mathbf{x}_2$, what happens in the $Z$ matrix of full Gaussian RBF network?
1. the first two rows of the matrix are the same
2. the first two columns of the matrix are different
3. the matrix is invertible
4. the sub-matrix at the intersection of the first two rows and the first two columns contains a constant of 0

Reference Answer: (1)

It is easy to see that the first two rows must be the same; so must the first two columns. The two identical rows make the matrix singular; the sub-matrix in (4) contains a constant of $1 = \exp(-0)$ instead of 0.
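The reference answer can be checked numerically with a tiny example. The four points below are hypothetical, with the first two deliberately identical, and $\gamma = 1$ is an arbitrary choice.

```python
import numpy as np

# Hypothetical inputs with x_1 == x_2; the other two points are distinct.
X = np.array([[0.5, -0.2], [0.5, -0.2], [1.0, 0.3], [-0.7, 0.9]])

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
Z = np.exp(-1.0 * sq_dists)                      # Gaussian RBF with gamma = 1

rows_equal = np.array_equal(Z[0], Z[1])          # first two rows identical
cols_equal = np.array_equal(Z[:, 0], Z[:, 1])    # first two columns identical
rank = np.linalg.matrix_rank(Z)                  # below 4, so Z is singular
top_left = Z[:2, :2]                             # all entries exp(-0) = 1

print(rows_equal, cols_equal, rank)
print(top_left)
```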
Radial Basis Function Network k-Means Algorithm
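Good Prototypes: Clustering Problem

if $\mathbf{x}_1 \approx \mathbf{x}_2$,
$\Longrightarrow$ no need for both $\text{RBF}(\mathbf{x}, \mathbf{x}_1)$ & $\text{RBF}(\mathbf{x}, \mathbf{x}_2)$ in RBFNet,
$\Longrightarrow$ cluster $\mathbf{x}_1$ and $\mathbf{x}_2$ by one prototype $\boldsymbol{\mu} \approx \mathbf{x}_1 \approx \mathbf{x}_2$

• clustering with prototype:
  - partition $\{\mathbf{x}_n\}$ to disjoint sets $S_1, S_2, \ldots, S_M$
  - choose $\boldsymbol{\mu}_m$ for each $S_m$
  hope: $\mathbf{x}_1, \mathbf{x}_2$ both $\in S_m \Leftrightarrow \boldsymbol{\mu}_m \approx \mathbf{x}_1 \approx \mathbf{x}_2$
• cluster error with squared error measure:
  $E_{\text{in}}(S_1, \ldots, S_M; \boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_M) = \frac{1}{N} \sum_{n=1}^{N} \sum_{m=1}^{M} [\![\mathbf{x}_n \in S_m]\!] \, \|\mathbf{x}_n - \boldsymbol{\mu}_m\|^2$

goal: with $S_1, \ldots, S_M$ being a partition of $\{\mathbf{x}_n\}$,

$\min_{\{S_1, \ldots, S_M; \boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_M\}} E_{\text{in}}(S_1, \ldots, S_M; \boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_M)$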
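The cluster error is easy to evaluate once the partition is stored as one cluster index per example. The sketch below assumes that representation (`assignments[n] = m` means $\mathbf{x}_n \in S_m$); the function name `cluster_error` and the toy data are hypothetical.

```python
import numpy as np

def cluster_error(X, assignments, prototypes):
    """E_in = (1/N) * sum_n ||x_n - mu_{m(n)}||^2 with m(n) = assignments[n]."""
    diffs = X - prototypes[assignments]
    return np.mean(np.sum(diffs ** 2, axis=1))

# Hypothetical toy data: two pairs of nearby points, far apart from each other.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
prototypes = np.array([[0.0, 1.0], [10.0, 1.0]])

good_partition = np.array([0, 0, 1, 1])   # each point joins its nearby prototype
bad_partition = np.array([0, 1, 0, 1])    # two points join the far prototype

print(cluster_error(X, good_partition, prototypes))
print(cluster_error(X, bad_partition, prototypes))
```

The good partition gives an error of 1 (every point is at squared distance 1 from its prototype), while the bad one gives 51, since two points sit at squared distance 101.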


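Partition Optimization

with $S_1, \ldots, S_M$ being a partition of $\{\mathbf{x}_n\}$ (the constant factor $\frac{1}{N}$ is dropped),

$\min_{\{S_1, \ldots, S_M; \boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_M\}} \sum_{n=1}^{N} \sum_{m=1}^{M} [\![\mathbf{x}_n \in S_m]\!] \, \|\mathbf{x}_n - \boldsymbol{\mu}_m\|^2$

• hard to optimize: joint combinatorial-numerical optimization
• two sets of variables: will optimize alternatingly

if $\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_M$ fixed, for each $\mathbf{x}_n$
• $[\![\mathbf{x}_n \in S_m]\!]$: choose one and only one subset
• $\|\mathbf{x}_n - \boldsymbol{\mu}_m\|^2$: distance to each prototype

optimal chosen subset $S_m$ = the one with minimum $\|\mathbf{x}_n - \boldsymbol{\mu}_m\|^2$

for given $\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_M$, each $\mathbf{x}_n$
'optimally partitioned' using its closest $\boldsymbol{\mu}_m$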





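Prototype Optimization

with $S_1, \ldots, S_M$ being a partition of $\{\mathbf{x}_n\}$ (the constant factor $\frac{1}{N}$ is dropped),

$\min_{\{S_1, \ldots, S_M; \boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_M\}} \sum_{n=1}^{N} \sum_{m=1}^{M} [\![\mathbf{x}_n \in S_m]\!] \, \|\mathbf{x}_n - \boldsymbol{\mu}_m\|^2$

• hard to optimize: joint combinatorial-numerical optimization
• two sets of variables: will optimize alternatingly

if $S_1, \ldots, S_M$ fixed, just unconstrained optimization for each $\boldsymbol{\mu}_m$

$\nabla_{\boldsymbol{\mu}_m} E_{\text{in}} = -2 \sum_{n=1}^{N} [\![\mathbf{x}_n \in S_m]\!] \, (\mathbf{x}_n - \boldsymbol{\mu}_m) = -2 \left( \left( \sum_{\mathbf{x}_n \in S_m} \mathbf{x}_n \right) - |S_m| \, \boldsymbol{\mu}_m \right)$

optimal prototype $\boldsymbol{\mu}_m$ = average of $\mathbf{x}_n$ within $S_m$ (set the gradient to zero)

for given $S_1, \ldots, S_M$, each $\boldsymbol{\mu}_m$
'optimally computed' as consensus within $S_m$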
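A quick numerical check of the "average within $S_m$" result: for one fixed cluster, the gradient of the within-cluster squared error vanishes at the mean, and moving the prototype away from the mean increases the error. The cluster contents and the perturbations below are hypothetical.

```python
import numpy as np

def within_cluster_error(points, mu):
    """Sum of squared distances from the cluster's points to the prototype mu."""
    return np.sum((points - mu) ** 2)

def gradient(points, mu):
    """Gradient with respect to mu: -2 * sum_n (x_n - mu)."""
    return -2.0 * np.sum(points - mu, axis=0)

S_m = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]])   # hypothetical cluster
mu_star = S_m.mean(axis=0)                              # candidate optimum: the average

grad_at_mean = gradient(S_m, mu_star)
best_error = within_cluster_error(S_m, mu_star)
perturbed_errors = [within_cluster_error(S_m, mu_star + np.array(delta))
                    for delta in ([0.1, 0.0], [0.0, -0.1], [0.3, 0.3])]

print(mu_star, grad_at_mean, best_error, perturbed_errors)
```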


k-Means Algorithm

use $k$ prototypes instead of $M$ historically
(different from $k$ nearest neighbor, though)

k-Means Algorithm
1. initialize $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_k$: say, as $k$ randomly chosen $\mathbf{x}_n$
2. alternating optimization of $E_{\text{in}}$: repeatedly
   (a) optimize $S_1, S_2, \ldots, S_k$:
       each $\mathbf{x}_n$ 'optimally partitioned' using its closest $\boldsymbol{\mu}_m$
   (b) optimize $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_k$:
       each $\boldsymbol{\mu}_m$ 'optimally computed' as consensus within $S_m$
   until converge

converge: no change of $S_1, S_2, \ldots, S_k$ anymore
  - guaranteed as $E_{\text{in}}$ decreases during alternating minimization

k-Means: the most popular clustering
algorithm through alternating minimization
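The alternating procedure can be written compactly in NumPy. This is a minimal sketch rather than a production implementation: the function name `k_means`, the `max_iter` cap, the rule of keeping the old prototype when a cluster becomes empty, and the two-blob toy data are all assumptions not specified on the slide.

```python
import numpy as np

def k_means(X, k, rng, max_iter=100):
    """Alternating minimization of the clustering error; returns (prototypes, assignments)."""
    # Initialize the prototypes as k randomly chosen examples.
    mu = X[rng.choice(len(X), size=k, replace=False)].copy()
    assign = None
    for _ in range(max_iter):
        # (a) Partition step: each x_n joins its closest prototype.
        sq_dists = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)
        new_assign = np.argmin(sq_dists, axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # no change of the partition: converged
        assign = new_assign
        # (b) Prototype step: each prototype becomes the average of its cluster.
        for m in range(k):
            members = X[assign == m]
            if len(members) > 0:  # assumption: an empty cluster keeps its old prototype
                mu[m] = members.mean(axis=0)
    return mu, assign

rng = np.random.default_rng(0)
# Hypothetical toy data: two well-separated blobs of 20 points each.
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(20, 2)),
               rng.normal([5.0, 5.0], 0.3, size=(20, 2))])
mu, assign = k_means(X, k=2, rng=rng)
print(mu)
```

On well-separated data like this the two prototypes end up near the blob centers; on harder data the result depends on $k$ and on the random initialization.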


RBF Network Using k-Means 


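1. run k-Means with $k = M$ to get $\{\boldsymbol{\mu}_m\}$
2. construct transform $\Phi(\mathbf{x})$ from RBF (say, Gaussian) at $\boldsymbol{\mu}_m$:
   $\Phi(\mathbf{x}) = [\text{RBF}(\mathbf{x}, \boldsymbol{\mu}_1), \text{RBF}(\mathbf{x}, \boldsymbol{\mu}_2), \ldots, \text{RBF}(\mathbf{x}, \boldsymbol{\mu}_M)]$
3. run linear model on $\{(\Phi(\mathbf{x}_n), y_n)\}$ to get $\boldsymbol{\beta}$
4. return $g_{\text{RBFNET}}(\mathbf{x}) = \text{LinearHypothesis}(\boldsymbol{\beta}, \Phi(\mathbf{x}))$

• using unsupervised learning (k-Means) to assist feature transform, like autoencoder
• parameters: $M$ (prototypes), RBF (such as $\gamma$ of Gaussian)

RBF Network: a simple (old-fashioned) model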
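The four steps can be chained into a small end-to-end sketch. Several choices here are assumptions rather than part of the slide: the linear model is ridge regression on $\pm 1$ labels followed by a sign, there is no bias term, an empty cluster keeps its old prototype, and the data and the values of $M$, $\gamma$, and $\lambda$ are hypothetical.

```python
import numpy as np

def k_means(X, k, rng, max_iter=100):
    """Plain k-Means; returns prototypes of shape (k, d)."""
    mu = X[rng.choice(len(X), size=k, replace=False)].copy()
    assign = None
    for _ in range(max_iter):
        sq_dists = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)
        new_assign = np.argmin(sq_dists, axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break
        assign = new_assign
        for m in range(k):
            if np.any(assign == m):
                mu[m] = X[assign == m].mean(axis=0)
    return mu

def rbf_transform(X, mu, gamma):
    """Phi(x) = [RBF(x, mu_1), ..., RBF(x, mu_M)] with a Gaussian RBF, one row per example."""
    sq_dists = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq_dists)

def fit_rbf_network(X, y, M, gamma, lam, rng):
    mu = k_means(X, M, rng)                      # step 1: prototypes
    Phi = rbf_transform(X, mu, gamma)            # step 2: feature transform
    # Step 3 (assumed linear model): ridge regression on the transformed data.
    beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)
    return mu, beta

def predict(X, mu, beta, gamma):
    """Step 4: sign of the linear hypothesis on Phi(x)."""
    return np.sign(rbf_transform(X, mu, gamma) @ beta)

rng = np.random.default_rng(0)
# Hypothetical toy data: a negative blob and a positive blob.
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(20, 2)),
               rng.normal([5.0, 5.0], 0.3, size=(20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])

mu, beta = fit_rbf_network(X, y, M=2, gamma=1.0, lam=0.01, rng=rng)
accuracy = np.mean(predict(X, mu, beta, gamma=1.0) == y)
print("training accuracy:", accuracy)
```

Here $M$ and $\lambda$ both act as regularization knobs: fewer prototypes mean fewer weights, and a larger $\lambda$ shortens $\boldsymbol{\beta}$.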


Fun Time

For k-Means, consider examples $\mathbf{x}_n \in \mathbb{R}^2$ such that all $x_{n,1}$ and $x_{n,2}$ are non-zero. When fixing two prototypes $\boldsymbol{\mu}_1 = [1, 1]$ and $\boldsymbol{\mu}_2 = [-1, 1]$, which of the following sets is the optimal $S_1$?
1. $\{\mathbf{x}_n : x_{n,1} > 0\}$
2. $\{\mathbf{x}_n : x_{n,1} < 0\}$
3. $\{\mathbf{x}_n : x_{n,2} > 0\}$
4. $\{\mathbf{x}_n : x_{n,2} < 0\}$

Reference Answer: (1)

Note that $S_1$ contains examples that are closer to $\boldsymbol{\mu}_1$ than $\boldsymbol{\mu}_2$.


Radial Basis Function Network k-Means and RBF Network in Action


Beauty of k-Means

[Figure: k-Means with k = 4 on a two-dimensional data set, shown step by step from iteration 0 to iteration 6]

usually works well
with proper k and initialization


Difficulty of k-Means

[Figure: k-Means on a two-dimensional data set with k = 2 (iteration 0, then iteration 9) and with k = 4 (iteration 0, then iteration 7)]

'sensitive' to k and initialization


RBF Network Using k-Means 


























reasonable performance 
with proper centers


Full RBF Network 





[Figure: panels titled 'k = 4' and 'nearest neighbor']

full RBF Network: generally less useful


Fun Time

When coupled with ridge linear regression, which of the following RBF Networks is 'most regularized'?
1. small $M$ and small $\lambda$
2. small $M$ and large $\lambda$
3. large $M$ and small $\lambda$
4. large $M$ and large $\lambda$

Reference Answer: (2)

small $M$: fewer weights and more regularized;
large $\lambda$: shorter $\boldsymbol{\beta}$ and more regularized.


Summary


1. Embedding Numerous Features: Kernel Models
2. Combining Predictive Features: Aggregation Models
3. Distilling Implicit Features: Extraction Models


Lecture 14: Radial Basis Function Network
• RBF Network Hypothesis
  prototypes instead of neurons as transform
• RBF Network Learning
  linear aggregation of prototype 'hypotheses'
• k-Means Algorithm
  clustering with alternating optimization
• k-Means and RBF Network in Action
  proper choice of # prototypes important

• next: extracting features from abstract data





